ABSTRACT

Increasingly, machine learning is entertained as a mechanism for improving the efficiency of planning systems. Research in this area has generated an impressive battery of techniques and a growing body of empirical successes. Unfortunately, the formal properties of these systems are not well understood. This is highlighted by a growing corpus of demonstrations where learning actually degrades planning performance. In this paper we view learning to plan as a search problem. Learning is seen as a transformational process where a planner is tailored to a particular domain and problem distribution. To accomplish this task, learning systems draw from a vocabulary of transformation operators such as macro-operators or control rules. These "learning operators" define a space of possible transformations through which a system must search for an efficient planner. This study shows that the complexity of this search precludes a general solution and can only be approached via simplifications. Frequently unarticulated commitments which underlie current learning approaches are illustrated. These simplifications improve learning efficiency but not without tradeoffs. In some cases these tradeoffs result in less than optimal behavior. In others, they produce planners which become worse through learning. It is hoped that by articulating these commitments their ramifications can be better understood. Finally, a particular learning technique, COMPOSER, is discussed which explicitly utilizes these simplifications to ensure performance improvements with reasonable efficiency. (Contains 34 references.)
(Author/ALF) 
1 INTRODUCTION

In machine learning there is considerable interest in techniques which improve planning ability. Investigation in this area has identified a wide array of techniques including macro-operators [DeJong86, Fikes72, Mitchell86], chunks [Laird86], and control rules [Minton88, Mitchell83]. With these techniques comes a growing battery of successful demonstrations in domains ranging from the 8-puzzle to Space Shuttle payload processing. Unfortunately, the formal properties of these techniques are not well understood. This is highlighted by a growing body of demonstrations where learning degrades planning performance [Etzioni90a, Gratch91b, Minton85, Subramanian90].

In this paper we develop a view of learning as search and compare alternative learning strategies against this common perspective. We show that the general task of improving a planner is too complex to solve directly and must be approached by imposing simplifications. Section 3 explores the frequently unarticulated commitments which underlie learning approaches. These simplifications improve learning efficiency but not without tradeoffs. In some cases these tradeoffs result in less than optimal behavior. In others, they lead to planners which grow worse through learning. It is hoped that by articulating these commitments we can better understand their ramifications.

Section 4 discusses a particular method which ensures performance improvements with reasonable efficiency. The COMPOSER system is an extension of the PRODIGY/EBL approach to speed-up learning. Its contribution is a rigorous form of utility analysis which provides probabilistic guarantees of improvement.

2 ADEQUATE LEARNING 

A learning system should take a particular planning system operating within a particular domain and tailor it to solve problems more effectively. This can be viewed as a transformational process where a series of transformations are applied to the original problem solver (see [Gratch90b, Greiner92]). A planner may be transformed by the addition of control knowledge. Different forms of control knowledge include macro-operators [Braverman88, Laird86, Markovitch89], control rules [Drummond89, Etzioni90a, Minton88, Mitchell83], and static board evaluation functions [Utgoff91]. Alternatively, a learning system may modify the domain theory. Such transformations could be truth (accuracy) preserving, as in conjunct reordering or deletion of redundancy [Smith85], or non-truth preserving, as in theory revision tasks [Richards91, Towell90].
The transformations available to a learner define its vocabulary of transformations. These are essentially learning operators, and collectively they define a transformation space. For instance, acquiring a macro-operator can be viewed as transforming the initial state (the original planner) into a new state (the planner operating with the macro-operator). A learning system must search this space for a sequence of transformations which results in a better planner.

First, we will precisely define what makes a learning system adequate. Given an initial planner, $P_{old}$, a learning system is minimally adequate if it either 1) halts without making any changes, or 2) applies a sequence of transformations which yields a new planner $P_{new}$ which is preferred to $P_{old}$ (i.e. if it does anything, it produces a better planner). One learning system is more adequate (better) than another if it produces preferred planners. A system which identifies the most preferred planner in the transformation space (there may be more than one) is optimal. To make this notion concrete we must define a preference function over planners.

There are many ways to assign preferences. For this paper we view preference in terms of a numeric utility function which ranks differently transformed planners according to our intuitive notions of preference. In particular, we will consider the case where a utility value can be assigned to a planner's behavior for any given problem. The utility of a planner is then defined, with respect to a particular problem distribution, as the sum of problem utilities weighted by the probability of each problem. A similar definition appears in [Greiner92]. This covers the obvious measures of effectiveness. For example, if we are interested in minimizing problem solving cost, problem utility increases as the cost required to solve that problem decreases. The utility of the planner is then:

$$\mathrm{UTILITY}(planner) = -\sum_{prob \in Distribution} \mathrm{Cost}(planner, prob) \times \Pr(prob)$$
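As a minimal sketch of this definition (not code from the paper), assume a finite problem distribution given as a mapping from problems to probabilities, and a hypothetical cost function for one fixed planner. The planner's utility is then the negated expected cost:

```python
from typing import Callable, Dict, Hashable

Problem = Hashable


def planner_utility(cost: Callable[[Problem], float],
                    distribution: Dict[Problem, float]) -> float:
    """UTILITY(planner) = -sum over problems of Cost(planner, prob) * Pr(prob).

    `cost(prob)` is the (hypothetical) problem-solving cost of one fixed
    planner on `prob`; `distribution` maps each problem to its probability.
    """
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("problem probabilities must sum to 1")
    return -sum(cost(prob) * p for prob, p in distribution.items())


# Made-up distribution and costs, for illustration only.
dist = {"p1": 0.5, "p2": 0.25, "p3": 0.25}
costs = {"p1": 10.0, "p2": 40.0, "p3": 20.0}
print(planner_utility(costs.__getitem__, dist))  # -20.0
```

Higher (less negative) utility means lower expected cost, so ranking planners by this value matches the cost-minimizing preference described above.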

Utility is a preference function which ranks different planners. But the learning algorithm must search through alternative transformations, so we need a way to convert preferences over planners into preferences over transformations. The utility of a transformation is the change in utility that results from applying the transformation to a particular planner. This means the utility of a transformation is conditional on the planner to which it is applied. We denote this as $\Delta\mathrm{UTILITY}(Transformation \mid Planner)$. Applying a transformation with positive utility results in a more effective planner. A learning system need not explicitly compute utility values to identify preferred planners, but it must act (at least approximately) as if it does. In fact, most learning systems do not explicitly make utility determinations.
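The conditional nature of ΔUTILITY can be illustrated with a small sketch. Here a planner is modeled (hypothetically) as a function from a problem to its solving cost, a transformation as a function from planners to planners, and the expectation over the distribution is replaced by a mean over a sample of problems, which amounts to the brute-force estimate discussed in Section 3.2. Since utility is negated cost, ΔUTILITY reduces to the average cost saved:

```python
from typing import Callable, Hashable, Sequence

Problem = Hashable
Planner = Callable[[Problem], float]           # problem -> solving cost
Transformation = Callable[[Planner], Planner]  # planner -> new planner


def delta_utility(t: Transformation, planner: Planner,
                  sample: Sequence[Problem]) -> float:
    """Sample estimate of DELTA-UTILITY(t | planner), with UTILITY = -expected cost.

    Problems in `sample` are assumed to be drawn from the problem distribution,
    so the sample mean stands in for the probability-weighted sum.
    """
    transformed = t(planner)
    return sum(planner(p) - transformed(p) for p in sample) / len(sample)


def add_macro(planner: Planner) -> Planner:
    # Hypothetical macro-operator: +1 match overhead on every problem,
    # but it halves the cost of problem "b".
    return lambda p: planner(p) / 2 + 1 if p == "b" else planner(p) + 1


base = {"a": 10.0, "b": 30.0}.__getitem__
other = {"a": 10.0, "b": 4.0}.__getitem__
print(delta_utility(add_macro, base, ["a", "b"]))   # 6.5
print(delta_utility(add_macro, other, ["a", "b"]))  # 0.0
```

The same transformation is worth 6.5 cost units per problem to one planner and nothing to the other, which is why the utility of a transformation must be conditioned on the planner it is applied to.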

3 CHALLENGES AND DIRECTIONS 

We are presenting a view of learning as search. A learning system must move through the transformation space in search of preferred planners. To accomplish this task it must identify transformations with positive utility. Typically, transformations are proposed in response to problem solving successes or impasses, and their benefit is estimated (if at all) from future problem solving episodes.

There are three basic challenges to adequate learning: 1) the space of possible transformations is too large, 2) it is expensive to reliably identify good transformations, and 3) it may be infeasible to solve even a single problem with the initial planner, confounding the processes of transformation proposal and validation. These difficulties suggest that there does not exist a general solution to this task. In spite of this, there are a number of published techniques which claim to work well. The remainder of this section resolves this contradiction by illustrating how learning techniques make simplifications to address each challenge.

In very few cases are simplifications explicit in the published reports of these works. Thus, it is difficult to determine the precise simplifications an algorithm embodies. Even if elucidated, these may not be useful, in the sense that it would be hard to instantiate them in a different algorithm. Instead, we present concise and obvious simplifications for each challenge. We then argue how different techniques are best viewed as approximating these commitments. Figure 1 summarizes the approaches to be discussed. It is intended to be representative of the approaches which are popular in the literature and is not meant to be exhaustive. The discussion is organized into approaches for each of the three basic challenges. Unless otherwise noted, solutions to one challenge are independent of the solutions to other challenges. There are frequently relationships between the approaches within each challenge, and these will be noted in the text.

3.1 Transformation Space Complexity 

In this section we presume that the utility of transformations can be determined. The challenge is to efficiently find an appropriate sequence of transformations. Unfortunately, there are many ways to change the initial planner. We can consider all possible subsets of transformations. Furthermore, as several authors have noted, the order of these transformations is often important (e.g. very different behavior can result from changes in the order in which macro-operators are evaluated).

Figure 1: Summary of simplifying approaches

For a set of $n$ transformations there are $O(n \cdot n!)$ distinguishable changes (any planner may be the result of up to $n$ ordered transformations). Even $n$ may be excessively large. For example, Etzioni provides a bound on the number of control rules entertained by his STATIC system which is exponential in the number of predicates in the domain theory [Etzioni90b pp. 171-174]. If transformations involve continuous parameters (e.g. modifications to the real-valued parameters of a static board evaluator), the space may be uncountably infinite.
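To make these counts concrete, the sketch below enumerates the two sizes for small $n$: the number of ordered sequences of up to $n$ distinct transformations, $\sum_{k=0}^{n} n!/(n-k)!$, which is at most $(n+1)\cdot n!$ and hence $O(n \cdot n!)$, versus the $2^n$ subsets that remain when order is ignored (the T1 simplification below). It treats every distinct sequence or subset as a distinct planner, which is an upper-bound assumption:

```python
from math import perm  # Python 3.8+: perm(n, k) = n! / (n - k)!


def ordered_sequences(n: int) -> int:
    """Planners reachable by applying up to n distinct transformations in some order."""
    return sum(perm(n, k) for k in range(n + 1))


def unordered_sets(n: int) -> int:
    """Planners when only the set of adopted transformations matters."""
    return 2 ** n


for n in (3, 5, 10):
    print(n, ordered_sequences(n), unordered_sets(n))
# 3 16 8
# 5 326 32
# 10 9864101 1024
```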

An unconstrained algorithm will consider all these alternatives. This is rarely feasible. However, there are a number of simplifications which can reduce the space. We discuss three which are orthogonal, and many systems adopt more than one. The first approach is to avoid searching all possible orderings of transformations:

AVOIDING PERMUTATIONS (T1)

With this commitment, the number of distinguishable planners is reduced from $O(n \cdot n!)$ to $O(2^n)$. In practice this means that if an algorithm is considering a new transformation to adopt, it will not consider all possible places to insert the transformation into the existing sequence. In some cases the order is resolved in a particular way. In the case of macro-operators, Shavlik suggests one ordering scheme: order the library of macro-operators such that each newly learned macro-operator is placed before the original domain theory but after previously learned macro-operators [Shavlik88]. Minton's PRODIGY/EBL system arbitrarily maintains control rules in the order they were learned [Minton88 p. 79]. In both cases the authors acknowledge that different ordering schemes result in different planning behavior.
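Shavlik's ordering scheme can be pictured as a small data structure. The sketch below is only an illustration under assumed names (`OperatorLibrary`, `add_macro`, `evaluation_order` are hypothetical, not from BAGGER or PRODIGY): operators are tried in list order, and a new macro-operator always lands after earlier macros and before the domain theory, so no alternative insertion points are ever explored:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperatorLibrary:
    """Operators are tried in list order: learned macros first, then the domain theory."""
    domain_ops: List[str]
    learned: List[str] = field(default_factory=list)

    def add_macro(self, macro: str) -> None:
        # Fixed rule: after previously learned macros, before the domain theory.
        # Other positions are never considered (simplification T1).
        self.learned.append(macro)

    def evaluation_order(self) -> List[str]:
        return self.learned + self.domain_ops


lib = OperatorLibrary(domain_ops=["move", "stack"])
lib.add_macro("m1")
lib.add_macro("m2")
print(lib.evaluation_order())  # ['m1', 'm2', 'move', 'stack']
```

Appending in learned order is also how PRODIGY/EBL is described as maintaining its control rules; in both cases the order is legislated rather than searched.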

Legislating a particular ordering will avoid many alternatives. When we can demonstrate that the order of transformations is irrelevant, or that the best ordering can be determined without search, this simplification can be adopted without cost. Without such guarantees, the system may not find preferred planners when they exist. Thus it may preclude optimality or otherwise lower the adequacy of the learning system. On the other hand, a system utilizing this simplification retains minimal adequacy, as it does not affect the reliability of utility values. Some systems which ignore alternative permutations are IMEX [Braverman88], COMPOSER [Gratch91a], STATIC [Etzioni90a], BAGGER [Shavlik88], and PRODIGY/EBL [Minton88]. We are not aware of a technique which does not adopt this assumption.

While T1 reduces complexity, it is generally insufficient. One of the difficulties is that even with ordering constraints there are a large number of conditional utilities: $\Delta\mathrm{UTILITY}(T_1 \mid P)$, $\Delta\mathrm{UTILITY}(T_1 \mid T_2 P)$, $\Delta\mathrm{UTILITY}(T_1 \mid T_2 T_3 P)$, etc. A powerful simplification is to treat the utility of a transformation as independent of other transformations.

INDEPENDENCE: $\forall T_i, T_j:\ \Delta\mathrm{UTILITY}(T_i \mid T_j P) = \Delta\mathrm{UTILITY}(T_i \mid P)$ (T2)

Under this simplification a learning system can evaluate each transformation once, without regard to the context of other transformations. This allows a local decision procedure for adopting transformations: if a transformation is beneficial, adopt it; otherwise discard it. The complexity of search is reduced to $O(n)$.

This assumption leads to adequate learning under certain sufficient conditions. If the transformations are in fact independent, a learning system can discover the optimal transformation sequence in $O(n)$. If this condition cannot be guaranteed, weaker conditions are sufficient to display at least minimal adequacy. A learning system can ignore dependencies as long as negative interactions are not overwhelming. An example of an overwhelming negative interaction would be the case where, individually, two transformations benefit the planner but, collectively, they hurt performance.
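The local decision procedure licensed by T2, and the way an overwhelming negative interaction defeats it, can be shown with a toy example. The sketch assumes a hypothetical utility table over sets of adopted transformations; the numbers are invented purely to exhibit the case just described (each transformation helps alone, both together hurt):

```python
from typing import Callable, FrozenSet, List

# Utility of the planner obtained by adopting a given set of transformations.
Utility = Callable[[FrozenSet[str]], float]


def adopt_independently(candidates: List[str], utility: Utility) -> FrozenSet[str]:
    """T2 decision procedure: judge each transformation once, against the
    initial planner only (n evaluations), and keep it if it looks beneficial."""
    base = utility(frozenset())
    return frozenset(t for t in candidates
                     if utility(frozenset([t])) - base > 0)


# Invented utilities: T1 and T2 each help alone, but interact badly together.
TOY = {
    frozenset(): -100.0,
    frozenset({"T1"}): -80.0,
    frozenset({"T2"}): -85.0,
    frozenset({"T1", "T2"}): -130.0,
}

chosen = adopt_independently(["T1", "T2"], TOY.__getitem__)
print(sorted(chosen), TOY[chosen])  # ['T1', 'T2'] -130.0
```

Both transformations pass the independent test, yet the resulting planner (utility -130) is worse than the initial one (-100), so this learner is not even minimally adequate on this table.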

Many speed-up learning systems adopt transformations using decision procedures which do not explicitly consider interactions (including STATIC [Etzioni90a], PRODIGY/EBL, RECEBG [Letovsky90], IMEX, and PEBL [Eskey90]). Ma and Wilkins illustrate a similar situation for knowledge-base revision systems [Wilkins89]. Systems which do not adopt this assumption include PALO [Greiner92], COMPOSER, and [Leckie91].

Unfortunately, accounts of systems which ignore interactions do not characterize when this simplification is appropriate. Furthermore, it is becoming clear that the transformations utilized by these techniques do interact. In [Gratch90b] and [Gratch91b] we demonstrate how macro-operators and control rules produce harmful interactions. In the latter article we illustrate a simple domain which exhibits strong negative interactions. When tested on this domain, the utility analysis technique of PRODIGY/EBL and the nonrecursive hypothesis of STATIC generate planners several times worse than the initial planner. These results show this simplification can produce inadequate performance. They also highlight why more emphasis must be placed on explicating and providing sufficient conditions for the simplifications which underlie learning techniques.

A final simplification is to avoid exploring the entire search space: 

INCOMPLETE/UNRECOVERABLE SEARCH (T3) 

There exists a wide array of search techniques to deal with complexity. For example, an unrecoverable technique like hill-climbing can reduce the complexity to $O(n^2)$ (at most $n$ choices at each step for at most $n$ steps). The number of choices can be further reduced by only considering a subset of the $n$ possible transformations. In either case these simplifications retain minimal adequacy at the cost of reduced learning opportunities. A hill-climbing technique which is stuck at a local maximum may produce a planner which is less adequate than that of a more comprehensive learner.
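A minimal hill-climbing sketch, combining T1 (new transformations are only appended) with T3 (unrecoverable search), might look as follows. The utility callback and the toy table are hypothetical; unlike the independent procedure of T2, each step re-evaluates candidates in the context of what has already been adopted:

```python
from typing import Callable, List, Sequence, Tuple


def hill_climb(candidates: Sequence[str],
               utility: Callable[[Tuple[str, ...]], float]) -> Tuple[str, ...]:
    """Unrecoverable search (T3): at most n steps of at most n evaluations each."""
    current: Tuple[str, ...] = ()
    remaining: List[str] = list(candidates)
    best_u = utility(current)
    while remaining:
        # Try appending each remaining transformation at the end (T1).
        u, t = max((utility(current + (c,)), c) for c in remaining)
        if u <= best_u:  # local maximum: no single step improves the planner
            break
        current, best_u = current + (t,), u  # committed; never undone
        remaining.remove(t)
    return current


# Invented utilities over sets of adopted transformations (order ignored).
TOY = {
    frozenset(): -100.0,
    frozenset({"T1"}): -80.0,
    frozenset({"T2"}): -85.0,
    frozenset({"T1", "T2"}): -130.0,
}
print(hill_climb(["T1", "T2"], lambda seq: TOY[frozenset(seq)]))  # ('T1',)
```

Because the second step is judged relative to the planner that already includes T1, the harmful combination is rejected. The price is that a committed step is never retracted, so the search can stop at a local maximum.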

Many learning techniques consider only a subset of the legal transformations. For example, SOAR does not entertain all possible chunks, but only those learned in response to impasses. Other techniques are explicit hill-climbers which terminate search when a local maximum is reached [Greiner92] or when the training set is exhausted [Gratch91a].
3.2 Estimation Complexity

The preceding discussion assumed that utility values were available to the learning system. Unfortunately this is rarely the case. The utility of a transformation depends on information which is frequently unavailable, like the distribution of future problems. The natural approach is to estimate utility from training examples.

The simplest approach to estimation is estimation by brute force. Execute the planner over all problems in the expected test set, observing its behavior. Then make a transformation and rerun the planner on the same set of problems. Use the difference between the two runs to estimate the utility of the transformation. As the test set may be large, unavailable, or infinite, and there may be many transformations from which to choose, this approach is impractical. Many techniques are thriftier in their use of examples. They embody the following simplification:

SIMULTANEOUS ESTIMATION (E1)

This means that the technique extracts information about multiple transformations from each example. PRODIGY/EBL, COMPOSER, SYLLOG [Markovitch89], and PALO [Greiner92] perform simultaneous estimation. This simplification is justified if the estimate for one transformation is not influenced by whatever other transformations are being estimated. This condition is easily verified empirically, and in some systems the simplification is unjustified. For example, PRODIGY/EBL gathers statistics for control rules as other rules are learned and forgotten. These shifting conditions influence the estimates. In [Gratch91b] we illustrate a domain where these influences lead to learning behavior which is not minimally adequate. COMPOSER and PALO were designed to explicitly enable simultaneous estimation.
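In its simplest form, simultaneous estimation is bookkeeping: each solved problem contributes one sample to every transformation under evaluation. The sketch assumes a hypothetical observation format (a mapping from each candidate rule to the utility it would have contributed on that problem) and assumes, as the justification above requires, that the candidates did not influence one another while the observation was made:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List

# Hypothetical per-problem observation: candidate rule -> utility it would
# have contributed on that problem (savings minus overhead), measured
# without letting the candidates change the planner's behavior.
Observation = Dict[str, float]


def estimate_simultaneously(observations: Iterable[Observation]) -> Dict[str, float]:
    samples: Dict[str, List[float]] = defaultdict(list)
    for obs in observations:          # one solved problem ...
        for rule, u in obs.items():   # ... yields a sample for every candidate
            samples[rule].append(u)
    return {rule: mean(us) for rule, us in samples.items()}


obs = [{"r1": 2.0, "r2": -1.0}, {"r1": 4.0, "r2": 0.5}]
print(estimate_simultaneously(obs))  # {'r1': 3.0, 'r2': -0.25}
```

Two solved problems yield two samples for each rule, whereas brute-force estimation would need separate runs per rule.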

The primary difficulty, however, is that we typically cannot feasibly evaluate a transformation over all future problems. Instead we must estimate utility from a subset of the distribution. Approaches to this problem vary in the number of examples required to produce their estimates. By definition, estimates are approximations to reality. A transformation which is estimated to be good may in fact have negative utility. Many approaches do not attempt to quantify or bound this type of error.

UNQUANTIFIED/UNBOUNDED ERROR (E2)

Techniques which derive utility estimates from examples but do not reason about the accuracy of those estimates include PRODIGY/EBL, SYLLOG, and [Leckie91]. The decision of how many examples are necessary is left in the hands of the user, and no guarantees are provided for the accuracy of the resulting estimates. It may be difficult in these cases to provide sufficient conditions for when the probability of error is acceptably low. The complement of E2 is:

QUANTIFIED/BOUNDED ERROR (E3)

Statistical methods can bound the probability of mistakes. Error can be reduced by taking more examples. In fact, given a pre-specified level of error, there are techniques which can draw sufficient examples to reach this error level (assuming the method is appropriate). The guarantee is gained at the cost of many training examples, but Greiner and Cohen show how (under weak assumptions) to achieve an arbitrary level of confidence with a polynomial number of examples [Greiner92]. Different statistical models require differing amounts of data. There is a large body of work in statistics which describes these alternative methods and their sufficient conditions.
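As one standard example of a distribution-free bound (offered for illustration; it is not claimed to be the method of [Greiner92] or of COMPOSER), Hoeffding's inequality states that for $n$ independent observations confined to an interval of width $R$, $P(|\bar{X} - \mu| \ge \epsilon) \le 2\exp(-2n\epsilon^2/R^2)$. Solving for $n$ gives the number of examples needed for a pre-specified error level:

```python
import math


def hoeffding_sample_size(value_range: float, epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 n eps^2 / R^2) <= delta (Hoeffding's inequality).

    With n i.i.d. utility observations confined to an interval of width
    `value_range`, the sample mean is then within `epsilon` of the true
    mean with probability at least 1 - delta.
    """
    if value_range <= 0 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need value_range > 0, epsilon > 0, 0 < delta < 1")
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))


print(hoeffding_sample_size(10.0, 1.0, 0.05))  # 185
print(hoeffding_sample_size(10.0, 0.5, 0.05))  # 738
```

Halving the tolerated error roughly quadruples the required sample, which illustrates the tradeoff noted above: the guarantee is bought with training examples.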

Finally, the most demanding simplification is: 

LEARNING WITHOUT EXAMPLES (E4) 

If we can identify transformations which reduce the problem solving cost of all possible problems, we can learn without data. This is similar to the notion of dominance in [Wellman90]. If this property can be demonstrated, a transformation will yield an improved planner independent of the problem distribution. Such transformations do exist. For example, a planner can only benefit from deleting an unsatisfiable or redundant operator. The STATIC system is grounded in the principle that its set of transformations will increase utility independent of distribution [Etzioni90b p. 134]. Unfortunately, the STATIC approach does not quantify or bound its accuracy, and it is difficult to demonstrate conditions when the accuracy is acceptable. Etzioni illustrates some domains where STATIC is adequate, but negative results appear in [Gratch91b]. But when it is appropriate, learning without examples is a powerful simplification, as it sidesteps the challenge of observation complexity to which we now turn.

3.3 Observation Complexity 

Learning techniques must observe data from the task environment to estimate utility values. Besides the distribution of problems, this data often involves properties of the planner's state space, or properties of solution paths. The most common way to gather this data is to solve problems with the currently transformed planner. Problems can be randomly drawn from the distribution and their solution traces examined.

LEARNING FROM SELF-SOLUTIONS (O1)

But planning is not tractable in general. Thus, it may not be feasible to solve even one problem with the initial planner. Learning from self-solutions equates observation complexity with the complexity of the initial planner in the domain of interest. If this approach is to be feasible, this complexity must be sufficiently small. While this condition is rarely articulated, it is implicit in a wide range of learning techniques [Braverman88, Laird86, Minton85, Mitchell83, Ruby91].

One might ask why we need to learn at all if problem solving is already feasible. However, in many circumstances this is quite reasonable. For instance, in situations where large numbers of problems must be solved, small increases in efficiency can result in huge savings.

If learning from self-solutions is impractical, an alternative approach is:

LEARNING FROM A TEACHER (O2)

Some techniques require a teacher to provide the appropriate data. This places the observation complexity in the hands of the teacher. The teacher, through its superior knowledge, presumably can elicit the data in reasonable time. As an example of this approach, Tadepalli demonstrates that when domains are serially decomposable, a system can generate a polynomial-time problem solver by learning a body of control knowledge called a macro-table [Tadepalli91]. To learn this table the system requires a teacher which knows the macro-table we wish to learn. This teacher then provides the learning system with solutions generated according to the table. Natarajan introduces a technique which requires the teacher to provide optimal solution paths [Natarajan89].

LEARNING FROM SIMPLER PROBLEMS (O3)

An intriguing alternative is to learn to solve intractable problems by training on simpler, tractable problems. This is analogous to classroom learning, where a carefully selected set of "text book" problems should lead to sophisticated problem solving ability. Natarajan demonstrates some sufficient conditions under which this type of learning is possible [Natarajan89].

When it is not feasible to completely solve a problem, useful information may still be gleaned from partial solutions:

LEARNING FROM PARTIAL SOLUTIONS (O4)

Techniques which learn from failure, like SOAR and PRODIGY/EBL, can generate transformations in response to incomplete solution attempts. Unfortunately it is difficult to generate reliable utility estimates from partial solution traces. For example, a transformation may behave very differently as problem solving proceeds. By prematurely terminating the planner, we may not get a representative sample of the transformation's behavior.

4 COMPOSER 

We have identified a number of the challenges facing an adequate learner. In this section we describe in some detail a particular approach to this problem, the COMPOSER system. An important property of this technique is that it makes its simplifications explicit. It implements a preference criterion based on reducing planning time. The utility of transformations is explicitly estimated, and the system adopts transformations which, with high probability, have positive utility. The approach is probabilistically adequate, where the probability of error is arbitrary and determined by the user.

COMPOSER is currently implemented with control rules as its vocabulary of transformations. A number of simplifications are adopted to approach the problem. The technique incrementally builds strategies one transformation at a time. The system only considers adding new control rules to the end of the existing list; thus, alternative orders for each transformation are ignored (T1). Only a subset of legal transformations is considered, namely rules which are proposed in response to planning impasses (T3). This reduced space of transformations is explored by hill-climbing (T3). Multiple transformations are estimated simultaneously (E1). Utility estimates are based on a statistical model, and error is bounded to a user-specified value (E3). The technique gathers data through self-solution (O1). If problem solving is not feasible, the system terminates without adopting any transformations.

COMPOSER can be viewed as a rigorous version of the utility analysis approach of PRODIGY/EBL. While our implementation is an augmentation of PRODIGY/EBL, the basic approach can be extended to a wide range of planners and transformation vocabularies.
4.1 Gathering Observations

COMPOSER uses the mechanisms of PRODIGY/EBL to propose control rules for the PRODIGY planner. The PRODIGY planner is a STRIPS-like planner which generates an annotated record of problem solving. The problem trace describes the resources spent at each node, including time spent evaluating control rules. PRODIGY/EBL's learning module analyzes this trace and conjectures possible control rules.

In PRODIGY/EBL, conjectured rules are added to the planner and undergo a heuristic form of utility analysis. Rules which fail this analysis are retracted. COMPOSER replaces this heuristic analysis with a rigorous alternative. Conjectured transformations are placed on a pending list of rules. The preconditions of pending rules are evaluated and the match cost recorded. However, the recommendations of pending rules are not performed. Instead, the system annotates the problem trace with the changes they would have introduced. In this way, estimated rules do not influence the behavior of the planner, which enables data to be gathered on multiple rules from a single solution trace. The specific process is fairly involved and is described in [Gratch90a]. The extracted utility values are conditional on the current set of transformations adopted by the system.
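The following is a deliberately simplified sketch of how one pending rule's per-problem utility might be read off a trace it did not influence. It is not the procedure of [Gratch90a]; the trace format, `TraceNode`, `PendingRule`, and the accounting rule (savings are the cost of the nodes the rule's recommendation would have eliminated, including their subtrees; overhead is its match cost at every node that would still be expanded) are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TraceNode:
    node_id: int
    cost: float  # resources spent expanding this node


@dataclass
class PendingRule:
    name: str
    match_cost: float      # cost of evaluating the rule's preconditions at one node
    would_prune: Set[int]  # nodes (with subtrees) its recommendation would eliminate


def observed_utility(rule: PendingRule, trace: List[TraceNode]) -> float:
    """Hypothetical per-problem utility of a pending rule: savings minus match overhead."""
    savings = sum(n.cost for n in trace if n.node_id in rule.would_prune)
    still_expanded = sum(1 for n in trace if n.node_id not in rule.would_prune)
    return savings - rule.match_cost * still_expanded


trace = [TraceNode(0, 5.0), TraceNode(1, 3.0), TraceNode(2, 2.0), TraceNode(3, 4.0)]
rule = PendingRule("r1", match_cost=0.5, would_prune={2, 3})
print(observed_utility(rule, trace))  # 5.0
```

Because the rule's recommendation is recorded but never executed, the same trace can be scored for every rule on the pending list.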

4.2 Estimating Utility 

Utility values from several problems can be combined to estimate the utility of a transformation across the distribution of problems. The system should only adopt transformations which have positive utility. Additionally, the system should remove from the pending list any transformations with negative utility. In statistics this is referred to as a sequential analysis problem: observations are gathered until some criterion, a stopping criterion, is satisfied. In this case we are estimating the utility of transformations to some specified confidence. We require the user to provide an error parameter, $\delta$, which specifies the acceptable probability of incorrectly adopting a transformation.¹

COMPOSER must choose between two hypotheses for any rule on the pending list:

$$H_0: \Delta\mathrm{UTILITY}(rule \mid planner) \le 0, \quad \text{or} \quad H_1: \Delta\mathrm{UTILITY}(rule \mid planner) > 0$$

Averaging the observations across problems yields an estimate of the true utility. This estimate will differ from the true value, so the system must bound the discrepancy. In particular, if the rule is negative, the system must bound the probability that it will appear positive, and vice versa. This is equivalent to bounding the probability that the magnitude of the true utility is exceeded by the difference between true and estimated utility:

$$P\left(\,|\mathrm{ESTIMATE} - \mathrm{UTILITY}| > |\mathrm{UTILITY}|\,\right) \le \delta$$

1. Alternatively we could require that $1 - \delta$ represent the confidence that every step makes progress. This requires determining a $\delta_i$ at each step such that the sum of all $\delta_i$ equals $\delta$.

Nádas [Nadas69] describes a distribution-free stopping criterion which can be applied.² It solves the more general problem of bounding the probability that an unknown mean is in an interval of size proportional to the mean. In our case we bound the unknown mean to an open interval which is exactly the size of the unknown mean. The technique requires gathering $M$ examples, where $M$ is defined as:

$$M = \min\left\{\, n : \left(\frac{V_n}{\bar{X}}\right)^2 < n\left(\frac{p}{a}\right)^2 \right\}$$

where $V_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$, $\bar{X}$ is the average utility, $X_i$ is the utility on problem $i$, and $p$ is the proportional parameter. In our case $p$ is close to one (0.9999). The parameter $a$ satisfies the constraint that $\Phi(a) = \delta/2$, where $\Phi$ is the cumulative distribution function of the standard normal distribution.
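The stopping test can be sketched directly from the formula above. The `min_n` floor is an assumption added here (with a single observation $V_1 = 0$ and the test would pass trivially), as is the treatment of a zero mean; the decision strings are hypothetical labels for $H_1$ and $H_0$:

```python
from statistics import NormalDist, fmean
from typing import Optional, Sequence


def nadas_should_stop(xs: Sequence[float], delta: float,
                      p: float = 0.9999, min_n: int = 2) -> bool:
    """Stop once (V_n / Xbar)^2 < n (p / a)^2, where Phi(a) = delta / 2."""
    n = len(xs)
    if n < min_n:                         # assumed floor on the sample size
        return False
    xbar = fmean(xs)
    if xbar == 0:                         # sign undetermined; keep sampling
        return False
    v2 = sum((x - xbar) ** 2 for x in xs) / n   # V_n^2
    a = NormalDist().inv_cdf(delta / 2)          # negative; only a^2 matters
    return v2 / xbar ** 2 < n * (p / a) ** 2


def decide(xs: Sequence[float], delta: float,
           p: float = 0.9999, min_n: int = 2) -> Optional[str]:
    """Return a decision once the stopping test passes, else None (keep sampling)."""
    if not nadas_should_stop(xs, delta, p, min_n):
        return None
    return "H1 (adopt)" if fmean(xs) > 0 else "H0 (reject)"


print(decide([4.0, 6.0, 5.0, 5.0], 0.05))  # H1 (adopt)
print(decide([10.0, -8.0], 0.05))          # None
print(decide([-3.0, -2.0, -4.0], 0.05))    # H0 (reject)
```

Consistent, low-variance observations stop almost immediately, while noisy observations whose mean is small relative to their spread keep the rule on the pending list.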

4.3 Learning

The stopping criterion permits COMPOSER to identify beneficial transformations with high probability. After each problem-solving attempt, COMPOSER updates the statistics and evaluates the stopping criterion for each element of the pending list. If no control rule has attained the confidence requirement, another problem is solved. If the stopping criterion identifies control rules with positive utility (there may be more than one), COMPOSER adds the control rule with the highest positive utility to the current strategy and removes it from the pending list. Statistics for the remaining pending rules are discarded, as they are meaningless in the context of the resulting control strategy. If the stopping criterion identifies pending rules with negative utility, they are eliminated from the pending list. Eliminating a pending rule does not affect the current strategy, so the statistics associated with the remaining pending rules are left unchanged. This cycle is repeated until the training set is exhausted. Each time a transformation is adopted, the efficiency of the PRODIGY planner is increased with high probability, giving COMPOSER an anytime behavior [Dean88].
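The cycle described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the COMPOSER implementation: `measure` is a hypothetical callback that solves one training problem with the current strategy and returns an observed utility for every pending rule; the stopping test is the one given above with $V_n$ taken as the sample standard deviation; and when harmful and beneficial rules are confirmed on the same problem, harmful ones are dropped first (an ordering the text does not specify). `toy_measure` and its utilities are invented for the demonstration.

```python
from statistics import NormalDist
from typing import Callable, Iterable

# Hypothetical callback: (current strategy, pending rules, problem) -> observed
# utility of each pending rule on that problem (e.g. seconds saved).
Measure = Callable[[list, set, object], dict]


def confident(obs: list[float], delta: float, p: float = 0.9999) -> bool:
    """Stopping test (V_n / Xbar)^2 < n (p / a)^2, with Phi(a) = delta / 2."""
    n = len(obs)
    if n < 2:                      # assumed floor; one sample has no spread
        return False
    mean = sum(obs) / n
    if mean == 0:
        return False
    var = sum((x - mean) ** 2 for x in obs) / n
    a = NormalDist().inv_cdf(delta / 2)
    return var / mean ** 2 < n * (p / a) ** 2


def composer(pending: set, problems: Iterable, measure: Measure,
             delta: float = 0.05) -> list:
    """Greedy adoption loop; returns the adopted rules in adoption order."""
    strategy: list = []
    stats = {rule: [] for rule in pending}
    for problem in problems:           # one problem-solving attempt per item
        if not stats:                  # nothing left to evaluate
            break
        for rule, utility in measure(strategy, set(stats), problem).items():
            stats[rule].append(utility)
        decided = {r: sum(o) / len(o) for r, o in stats.items()
                   if confident(o, delta)}
        # Rules confidently judged harmful are dropped; the strategy is
        # unchanged, so the other rules keep their statistics.
        for rule, mean in decided.items():
            if mean < 0:
                del stats[rule]
        positive = {r: m for r, m in decided.items() if m > 0}
        if positive:
            best = max(positive, key=positive.get)
            strategy.append(best)      # adopt the most useful rule
            del stats[best]
            # Old statistics were gathered under the previous strategy.
            stats = {rule: [] for rule in stats}
    return strategy


def toy_measure(strategy: list, pending: set, problem: int) -> dict:
    """Hypothetical utilities: 'good' helps, 'bad' hurts, 'noise' averages 0."""
    base = {"good": 10.0 + problem % 3,
            "bad": -5.0 - problem % 2,
            "noise": (-1.0) ** problem}
    return {rule: base[rule] for rule in pending}


print(composer({"good", "bad", "noise"}, range(20), toy_measure))   # ['good']
```

In the toy run, "good" and "bad" are both resolved after two problems; "bad" is discarded, "good" is adopted, and "noise" never reaches the confidence requirement, so it is neither adopted nor eliminated.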

2. The method is limited to distributions with a finite variance and provides an approximate confidence interval. Woodroofe provides second-order results showing that this approximation is very close in practice [Woodroofe82]. Greiner and Cohen [Greiner92] provide an alternative stopping rule which provides somewhat stronger guarantees at the cost of many more training examples. For example, in the domains we have tested, Nádas' technique requires on the order of ten training examples to accurately estimate the utility of a transformation. In the same domains, Greiner and Cohen's method requires several thousand training examples per transformation.
4.4 Evaluating COMPOSER

COMPOSER was tested on a domain from [Minton88], a domain in [Etzioni90a] for which PRODIGY/EBL produced harmful strategies, and a domain in [Gratch91b] which yielded detrimental results for both STATIC and PRODIGY/EBL. The results are summarized in Figure 2. The confidence for adding a transformation was set at 95%. In each domain the system is trained on 100 training examples drawn according to a fixed distribution. Snapshots are taken at designated intervals to provide a learning curve (the complete procedure is described in [Gratch90a]). The graphs illustrate learning curves where the independent measure is the number of random training examples and the dependent measure is execution time for 100 test problems. More effective strategies have lower solution times. We provide PRODIGY without learning and PRODIGY/EBL as benchmarks for comparison.




[Figure 2 graphs: learning curves for each domain (recoverable panel titles: STRIPS, BIN-WORLD); x-axis: # of training examples (0 to 100); y-axis: execution time for 100 test problems; curves: COMPOSER, PRODIGY/EBL, No Learning.]



(All values are averages; times in seconds.)

DOMAIN       COMPOSER                      PRODIGY/EBL                   No Learning
             Rules     Learning  Solution  Rules     Learning  Solution  Solution
             learned   time      time      learned   time      time      time
AB-WORLD     1         1663      208       11        1252      331       268
STRIPS       4         4139      357       20        3773      673       2362
BIN-WORLD    0         3425      346       2         6383      6020      346



Figure 2: Summary of empirical results

The results illustrate several interesting features. On all domains COMPOSER exceeded the performance of PRODIGY/EBL, including domains where PRODIGY/EBL is inadequate. This is significant because both systems investigate the same space of transformations. An important result is that the intermediate planning times resulting from COMPOSER are monotonically decreasing, which indicates that conditional utility is accurately estimated. A surprising finding is that in the domains where COMPOSER acquired a strategy, only one or two control rules account for most, if not all, of the savings. This suggests that most of the rules acquired by PRODIGY/EBL are, at best, superfluous.
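As a quick arithmetic check of these claims, the average solution times from the Figure 2 table can be turned into speedups relative to PRODIGY without learning. Only the table's numbers are used; the `speedup` helper and the dictionary layout are illustrative.

```python
# Average solution times (seconds) copied from the Figure 2 table.
solution_time = {
    "AB-WORLD":  {"COMPOSER": 208, "PRODIGY/EBL": 331,  "No Learning": 268},
    "STRIPS":    {"COMPOSER": 357, "PRODIGY/EBL": 673,  "No Learning": 2362},
    "BIN-WORLD": {"COMPOSER": 346, "PRODIGY/EBL": 6020, "No Learning": 346},
}


def speedup(domain: str, system: str) -> float:
    """Ratio > 1 means faster than the unlearned planner; < 1 means slower."""
    times = solution_time[domain]
    return times["No Learning"] / times[system]


for domain in solution_time:
    print(f"{domain:10s} COMPOSER x{speedup(domain, 'COMPOSER'):.2f}  "
          f"PRODIGY/EBL x{speedup(domain, 'PRODIGY/EBL'):.2f}")
```

The ratios come out to about 1.29, 6.62 and 1.00 for COMPOSER and 0.81, 3.51 and 0.06 for PRODIGY/EBL (AB-WORLD, STRIPS, BIN-WORLD): PRODIGY/EBL slows the planner in two of the three domains, while COMPOSER never does.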
6 CONCLUSIONS



Learning shows great promise for extending the generality and effectiveness of planning techniques. But if learning is to be useful, we must explicitly characterize the properties of these systems. This article introduces the notion of adequacy to assess the merits of learning techniques. Surprisingly, many learning systems are not even minimally adequate, in that they may worsen planning performance.



The complexity of learning makes unconstrained techniques infeasible, but the task can be approached by introducing one or more simplifications. We discussed how many learning techniques can be viewed as implicitly adopting these simplifications. In many cases these constraints are only approximately satisfied, with the result that these systems are not even minimally adequate. This highlights the need for explicit examination of the assumptions underlying new learning systems, and for analytical as well as empirical justification in future research. We discussed the COMPOSER system as one approach that is explicitly justified.